Abstract

Variable selection plays an important role in high-dimensional data analysis.
However, high-dimensional data often contain strongly correlated variables.
In this paper, we propose an Elastic Net procedure for partially linear models
and prove the group effect of its estimate. A simulation study shows that
the Elastic Net procedure handles strongly correlated variables better than
Lasso, ALasso and Ridge. An empirical analysis shows that the Elastic Net
procedure is particularly useful when the number of predictors $p$ is much
larger than the sample size $n$.

1. Introduction

High-dimensional data are widely used in medical research, bioinformatics, econometrics
and other fields, and have attracted a lot of interest recently. Variable selection is
fundamentally important for knowledge discovery with high-dimensional data, and it can
greatly enhance the prediction performance of the fitted model. Traditional model selection
procedures follow best-subset selection and its step-wise variants. However, best-subset
selection is computationally prohibitive when the number of predictors is large, and it is
unstable, so the resulting model has poor prediction accuracy. To overcome these
drawbacks of subset selection, statisticians have recently proposed various penalization
methods that perform model selection and estimation simultaneously. In particular, the Lasso
(Tibshirani [10]) and SCAD (Fan and Li [3]) are two very popular methods owing to their
good computational and statistical properties. Efron et al. [2] proposed the LARS
algorithm for computing the entire Lasso solution path. Knight and Fu [7] studied the
asymptotic properties of the Lasso. Fan and Li [3] showed that SCAD enjoys the
oracle property, whereas the Lasso does not. Zou then proposed
the Adaptive Lasso (ALasso), which uses an adaptively weighted $\ell_1$ penalty and has
the oracle property.

Correlated variables are very important in both applications and theory, so it is of
interest to estimate the coefficients of correlated variables. However, the methods
mentioned above cannot deal with strongly correlated variables satisfactorily. Zou and
Hastie introduced the Elastic Net procedure, which handles strongly correlated variables
effectively: owing to the group effect of the Elastic Net, the relevant strongly correlated
variables tend to enter the model together. Furthermore, like Lasso and
Ridge estimation, the Elastic Net procedure has some excellent properties, and it has
already received a considerable amount of attention. Zou and Zhang proved the oracle
property of the Adaptive Elastic Net. Chen et al. [1] showed that the profiled Adaptive
Elastic Net for partially linear models also has the oracle property: its estimate
identifies the right subset model and has the optimal estimation rate. But little work has
been done on highly correlated variables in this setting. In this paper we therefore
investigate whether the Elastic Net encourages the group effect in partially linear models.

The paper is organized as follows. In Section 2, we turn partially linear models into
classical linear models by kernel estimation; the Elastic Net procedure for partially
linear models is also presented there. In Section 3, we discuss the group effect induced
by the Elastic Net penalty for partially linear models. Simulation results comparing Lasso,
ALasso, Ridge and the Elastic Net are presented in Section 4. Section 5 studies a real
data example.


2. Elastic Net procedure 

Partially linear models are a class of commonly used semiparametric models that are
flexible and readily interpretable, since they contain both parametric and nonparametric
components. We now consider the Elastic Net procedure for partially linear models
and study its group effect.

Consider the following partially linear model,

\[ Y = X'\beta + f(T) + \varepsilon, \tag{2.1} \]

where $X = (x_1, \ldots, x_p)'$, $\beta = (\beta_1, \ldots, \beta_p)'$ is sparse, meaning that
only some of its components are nonzero, $f(\cdot)$ is an unknown smooth function of the
covariate $T$, and $\varepsilon$ is a random error with expectation 0 and standard deviation
$\sigma$ that is independent of $(X, T)$. In this paper, we only consider univariate $T$.
From (2.1), we have

\[ f(T) = E(Y|T) - E(X|T)'\beta. \]

Then

\[ Y - E(Y|T) = (X - E(X|T))'\beta + \varepsilon. \tag{2.2} \]

Obviously, we can turn the partially linear model into a classical linear model if
$E(X|T)$ and $E(Y|T)$ are known. We estimate $m_X(T) = E(X|T)$ and $m_Y(T) = E(Y|T)$
by kernel estimation. Suppose a random sample of $n$ individuals is chosen. Let
$X = (X_1', X_2', \ldots, X_n')'$ be the design matrix, where
$X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'$, $i = 1, 2, \ldots, n$.
Similarly, let $Y = (y_1, y_2, \ldots, y_n)'$, $T = (T_1, T_2, \ldots, T_n)'$, and
$\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$.
Moreover, denote the estimators of $m_X(T)$ and $m_Y(T)$ by $\hat m_X(T)$ and $\hat m_Y(T)$,
respectively.
where $K(\cdot)$ is a kernel function and $h$ is the bandwidth. Let
$\tilde y_i = y_i - \hat m_Y(T_i)$ and $\tilde X_i = X_i - \hat m_X(T_i)$. Then, in matrix
notation, (2.2) can be rewritten as

\[ \tilde Y = \tilde X'\beta + \varepsilon, \tag{2.3} \]

where $\tilde X = (\tilde X_1', \ldots, \tilde X_n')'$ and
$\tilde Y = (\tilde y_1, \tilde y_2, \ldots, \tilde y_n)'$. So (2.3) is a standard linear model, and
we may adopt the procedure developed by Zou and Hastie to study variable selection
for the partially linear model.
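
As a minimal sketch of the step from (2.1) to (2.3), the following Python code removes the estimated conditional means with a Nadaraya-Watson smoother. The smoother form, the Epanechnikov kernel, the bandwidth constant, the function names (`nw_smooth`, `profile_out`) and the test function $\sin(\pi T)$ are assumptions made for illustration; they are not taken from the text above.

```python
import numpy as np

def epanechnikov(u):
    # Assumed kernel; any nonnegative kernel K could be substituted.
    return 0.75 * np.maximum(1.0 - u ** 2, 0.0)

def nw_smooth(T, V, h, kernel=epanechnikov):
    """Nadaraya-Watson estimate of E(V | T) evaluated at every observed T_i.

    T has shape (n,); V has shape (n,) or (n, p)."""
    W = kernel((T[None, :] - T[:, None]) / h)   # W[i, j] = K((T_j - T_i) / h)
    W = W / W.sum(axis=1, keepdims=True)        # each row of weights sums to one
    return W @ V

def profile_out(X, y, T, h):
    """Return (X_tilde, y_tilde): the data with the estimated E(. | T) removed."""
    return X - nw_smooth(T, X, h), y - nw_smooth(T, y, h)

# Hypothetical data from Y = X'beta + f(T) + eps with f(T) = sin(pi * T).
rng = np.random.default_rng(0)
n = 400
beta = np.array([1.0, -2.0, 0.0])
X = rng.standard_normal((n, 3))
T = rng.uniform(-1.0, 1.0, size=n)
y = X @ beta + np.sin(np.pi * T) + 0.2 * rng.standard_normal(n)

h = n ** (-1.0 / 5.0)          # order n^(-1/5); the constant 1 is an assumption
X_tilde, y_tilde = profile_out(X, y, T, h)

# After profiling, ordinary least squares on the transformed data estimates beta.
beta_ols, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
print(np.round(beta_ols, 2))
```

Any penalized linear-model fit can be applied to `X_tilde` and `y_tilde` in place of the least-squares step used here.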

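Definition 2.1 For fixed nonnegative parameters $\lambda_1$ and $\lambda_2$, the Elastic Net
procedure for the partially linear model is defined as follows:

\[ L(\lambda_1, \lambda_2, \beta) = \|\tilde Y - \tilde X'\beta\|^2 + \lambda_2\|\beta\|^2 + \lambda_1\|\beta\|_1, \tag{2.4} \]

where $\|\beta\|^2 = \sum_{j=1}^{p} \beta_j^2$ and $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$.

Define

\[ \hat\beta(\mathrm{ENet}) = \arg\min_{\beta} L(\lambda_1, \lambda_2, \beta). \]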

According to Definition 2.1, the Elastic Net procedure reduces to the Lasso when
$\lambda_2 = 0$ in (2.4). By an appropriate transformation, the solution of the Elastic Net
procedure can be expressed in the same form as a Lasso solution (Zou and Hastie [14]). Thus we
can use the least angle regression algorithm (LARS) (Efron et al. [2]) to compute it.
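
One way to see such a transformation is the data-augmentation identity sketched below: appending $\sqrt{\lambda_2}\,I$ to the design and zeros to the response turns the criterion (2.4) into a pure Lasso criterion. This sketch makes two assumptions that differ from the text: it uses the unscaled augmentation (Zou and Hastie additionally rescale the augmented problem), and it solves the Lasso by cyclic coordinate descent rather than LARS. The names `lasso_cd`, `augment` and `enet_objective` and the data are hypothetical, and `X`, `y` stand for the already transformed $\tilde X$, $\tilde Y$ with observations in rows.

```python
import numpy as np

def lasso_cd(X, y, lam1, n_sweeps=500):
    """Minimise ||y - X b||^2 + lam1 * ||b||_1 by cyclic coordinate descent."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ b                                # current residual
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]                  # remove coordinate j from the fit
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam1 / 2.0, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]                  # put the updated coordinate back
    return b

def augment(X, y, lam2):
    """Stack sqrt(lam2) * I under X and p zeros under y."""
    p = X.shape[1]
    return (np.vstack([X, np.sqrt(lam2) * np.eye(p)]),
            np.concatenate([y, np.zeros(p)]))

def enet_objective(X, y, b, lam1, lam2):
    """Criterion (2.4) evaluated at b."""
    return (np.sum((y - X @ b) ** 2) + lam2 * np.sum(b ** 2)
            + lam1 * np.sum(np.abs(b)))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(50)
lam1, lam2 = 1.0, 1.0 / 3.0

X_aug, y_aug = augment(X, y, lam2)
b_enet = lasso_cd(X_aug, y_aug, lam1)            # Lasso on augmented data
print(np.round(b_enet, 3))
```

The identity holds because $\|y^* - X^*\beta\|^2 = \|y - X\beta\|^2 + \lambda_2\|\beta\|^2$ for the augmented pair $(X^*, y^*)$, so any Lasso solver applied to the augmented data minimizes (2.4).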

One of the key issues is the choice of the parameters $\lambda_n$ ($n = 1, 2$) and $h$. Here
we fix $\lambda_2$ and choose the optimal value of $\lambda_1$ by cross-validation
(Verweij [12]). For the bandwidth, the best value is of order $O(n^{-1/5})$, so we find an
effective bandwidth $h$ for $\hat m_X(T)$ and $\hat m_Y(T)$ with the interpolation technique
proposed by Ruppert et al. [9].
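
The tuning strategy for $\lambda_1$ can be sketched as a plain K-fold cross-validation over a grid with $\lambda_2$ held fixed. Everything specific here is an assumption for illustration: the grid, the simulated data, the function names `enet_cd` and `cv_lambda1`, and the use of coordinate descent in place of LARS. The data are treated as already transformed to the linear form (2.3).

```python
import numpy as np

def enet_cd(X, y, lam1, lam2, n_sweeps=200):
    """Minimise ||y - X b||^2 + lam2*||b||^2 + lam1*||b||_1 by coordinate descent."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                                  # residual for b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam1 / 2.0, 0.0) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]
    return b

def cv_lambda1(X, y, lam1_grid, lam2, n_folds=10, seed=0):
    """Return (best lam1, mean squared CV prediction error for each grid value)."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    cv_err = np.zeros(len(lam1_grid))
    for g, lam1 in enumerate(lam1_grid):
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False               # hold out one fold
            b = enet_cd(X[train], y[train], lam1, lam2)
            cv_err[g] += np.sum((y[test_idx] - X[test_idx] @ b) ** 2)
    cv_err /= n
    return lam1_grid[int(np.argmin(cv_err))], cv_err

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 6))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5, 0.0]) + 0.5 * rng.standard_normal(120)
lam1_grid = np.array([0.01, 0.1, 1.0, 10.0, 100.0, 1000.0])

best_lam1, cv_err = cv_lambda1(X, y, lam1_grid, lam2=1.0 / 3.0)
print(best_lam1)
```

A finer, data-dependent grid would normally be used in practice; the coarse grid here only keeps the sketch short.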


3. Group effect 

Collinearity is a major obstacle in dealing with high-dimensional data. Eliminating 
collinearity in the determination of the best linear model is a vital subject. In this section, 
we investigate the group effect of the Elastic Net procedure. 
Theorem 3.1 Assume that the response $Y$ is centred and the predictor $X$ is standardized.
Given the data $(X, Y)$ and parameters $(\lambda_1, \lambda_2)$, let $\hat\beta(\lambda_1, \lambda_2)$
be the Elastic Net estimate. Assume that
$\hat\beta_k(\lambda_1, \lambda_2)\hat\beta_l(\lambda_1, \lambda_2) > 0$. Define the group effect
$D_{\lambda_1,\lambda_2}(k,l)$ by

\[ D_{\lambda_1,\lambda_2}(k,l) = |\hat\beta_k(\lambda_1, \lambda_2) - \hat\beta_l(\lambda_1, \lambda_2)|. \tag{3.1} \]

Then

\[ D_{\lambda_1,\lambda_2}(k,l) \le \frac{2m}{\lambda_2} \sum_{i} |\hat r_i|, \tag{3.2} \]

where $m = \max_{i \in \{1,\ldots,n\}} |x_{ik} - x_{il}|$ and $\hat r_i$, given by (3.5), is the
predicted residual.

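Proof: Since $\hat\beta_k(\lambda_1, \lambda_2)\hat\beta_l(\lambda_1, \lambda_2) > 0$,

\[ \mathrm{sgn}\{\hat\beta_k(\lambda_1, \lambda_2)\} = \mathrm{sgn}\{\hat\beta_l(\lambda_1, \lambda_2)\}, \]

where $\mathrm{sgn}\{\cdot\}$ is the sign function.

Let $\hat\beta_m(\lambda_1, \lambda_2) \ne 0$. Note that $\hat\beta(\lambda_1, \lambda_2)$ satisfies

\[ \frac{\partial L(\lambda_1, \lambda_2, \beta)}{\partial \beta_m}\Big|_{\beta = \hat\beta(\lambda_1, \lambda_2)} = 0. \]

Moreover, we have

\[ L(\lambda_1, \lambda_2, \beta) = \|\tilde Y - \tilde X'\beta\|^2 + \lambda_2\|\beta\|^2 + \lambda_1\|\beta\|_1
   = \sum_{i} \Big(\tilde y_i - \sum_{j} \beta_j \tilde x_{ij}\Big)^2 + \lambda_2 \sum_{j} \beta_j^2 + \lambda_1 \sum_{j} |\beta_j|. \]

Therefore,

\[ -2\sum_{i} \tilde y_i \tilde x_{ik} + 2\sum_{i} \tilde x_{ik} \sum_{j} \hat\beta_j(\lambda_1, \lambda_2)\tilde x_{ij}
   + \lambda_1 \mathrm{sgn}\{\hat\beta_k(\lambda_1, \lambda_2)\} + 2\lambda_2 \hat\beta_k(\lambda_1, \lambda_2) = 0, \tag{3.3} \]

and

\[ -2\sum_{i} \tilde y_i \tilde x_{il} + 2\sum_{i} \tilde x_{il} \sum_{j} \hat\beta_j(\lambda_1, \lambda_2)\tilde x_{ij}
   + \lambda_1 \mathrm{sgn}\{\hat\beta_l(\lambda_1, \lambda_2)\} + 2\lambda_2 \hat\beta_l(\lambda_1, \lambda_2) = 0. \tag{3.4} \]

Subtracting (3.4) from (3.3), the $\lambda_1$ terms cancel because the two signs agree, and we have

\[ \hat\beta_k(\lambda_1, \lambda_2) - \hat\beta_l(\lambda_1, \lambda_2)
   = \frac{1}{\lambda_2} \sum_{i} (\tilde x_{ik} - \tilde x_{il})\Big(\tilde y_i - \sum_{j} \hat\beta_j(\lambda_1, \lambda_2)\tilde x_{ij}\Big). \]

On the other hand, we have

\[ \tilde x_{ik} = x_{ik} - \hat m_X(T_i)_k = x_{ik} - \frac{\sum_{j} K\big(\frac{T_j - T_i}{h}\big) x_{jk}}{\sum_{j} K\big(\frac{T_j - T_i}{h}\big)} \]

and

\[ \tilde x_{il} = x_{il} - \hat m_X(T_i)_l = x_{il} - \frac{\sum_{j} K\big(\frac{T_j - T_i}{h}\big) x_{jl}}{\sum_{j} K\big(\frac{T_j - T_i}{h}\big)}. \]

Therefore,

\[ |\hat\beta_k(\lambda_1, \lambda_2) - \hat\beta_l(\lambda_1, \lambda_2)|
   = \frac{1}{\lambda_2} \Big| \sum_{i} \Big[ x_{ik} - x_{il} - \frac{\sum_{j} K\big(\frac{T_j - T_i}{h}\big)(x_{jk} - x_{jl})}{\sum_{j} K\big(\frac{T_j - T_i}{h}\big)} \Big] \hat r_i \Big| \]

\[ \le \frac{1}{\lambda_2} \sum_{i} \Big[ |x_{ik} - x_{il}| + \frac{\sum_{j} K\big(\frac{T_j - T_i}{h}\big)|x_{jk} - x_{jl}|}{\sum_{j} K\big(\frac{T_j - T_i}{h}\big)} \Big] |\hat r_i| \]

\[ \le \frac{1}{\lambda_2} \sum_{i} \Big( |x_{ik} - x_{il}| + \max_{j} |x_{jk} - x_{jl}| \Big) |\hat r_i|, \]

where the last step uses that the kernel weights are nonnegative and sum to one, and

\[ \hat r_i = \tilde y_i - \sum_{j} \hat\beta_j(\lambda_1, \lambda_2)\tilde x_{ij}. \tag{3.5} \]

Let $m = \max_{i \in \{1,2,\ldots,n\}} |x_{ik} - x_{il}|$. Then we have

\[ \frac{1}{\lambda_2} \sum_{i} \Big( |x_{ik} - x_{il}| + \max_{j} |x_{jk} - x_{jl}| \Big) |\hat r_i| \le \frac{2m}{\lambda_2} \sum_{i} |\hat r_i|, \]

that is,

\[ |\hat\beta_k(\lambda_1, \lambda_2) - \hat\beta_l(\lambda_1, \lambda_2)| \le \frac{2m}{\lambda_2} \sum_{i} |\hat r_i|. \]

□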

$D_{\lambda_1,\lambda_2}(k,l)$ describes the difference between the coefficient paths of
predictors $k$ and $l$. $m \to 0$ means that $x_k$ and $x_l$ are highly correlated; Theorem 3.1
then implies that the difference between the coefficient paths of predictor $k$ and
predictor $l$ is almost zero. If $\hat\beta_k(\lambda_1, \lambda_2)\hat\beta_l(\lambda_1, \lambda_2) < 0$,
we consider $-x_k$ instead. The upper bound in (3.2) provides a quantitative description of
the group effect of the Elastic Net. The Elastic Net procedure thus has the ability to do
group selection, whereas the Lasso does not (Efron et al. [2]).
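
The bound (3.2) can be checked numerically on a small example. The sketch below is illustrative only: the data, the Epanechnikov kernel, the Nadaraya-Watson form of the smoother, the bandwidth constant and the function names `nw_weights` and `enet_cd` are assumptions, and the Elastic Net criterion is minimized by coordinate descent rather than LARS.

```python
import numpy as np

def nw_weights(T, h):
    """Row-normalised kernel weights, W[i, j] proportional to K((T_j - T_i) / h)."""
    u = (T[None, :] - T[:, None]) / h
    W = 0.75 * np.maximum(1.0 - u ** 2, 0.0)      # assumed Epanechnikov kernel
    return W / W.sum(axis=1, keepdims=True)

def enet_cd(X, y, lam1, lam2, n_sweeps=2000):
    """Minimise ||y - X b||^2 + lam2*||b||^2 + lam1*||b||_1 by coordinate descent."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam1 / 2.0, 0.0) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(1)
n, lam1, lam2 = 100, 0.1, 1.0 / 3.0
z = rng.standard_normal(n)
X = np.column_stack([z + 0.05 * rng.standard_normal(n),   # x_k
                     z + 0.05 * rng.standard_normal(n),   # x_l, nearly equal to x_k
                     rng.standard_normal(n)])
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)                 # standardised predictors
T = rng.uniform(-1.0, 1.0, size=n)
y = X @ np.array([3.0, 3.0, 1.5]) + np.sin(np.pi * T) + 0.2 * rng.standard_normal(n)
y = y - y.mean()                                  # centred response

W = nw_weights(T, n ** (-0.2))
X_t, y_t = X - W @ X, y - W @ y                   # transformed data as in (2.3)
b = enet_cd(X_t, y_t, lam1, lam2)
r = y_t - X_t @ b                                 # predicted residuals, as in (3.5)

k, l = 0, 1
D = abs(b[k] - b[l])                              # group effect (3.1)
m = np.max(np.abs(X[:, k] - X[:, l]))
bound = 2.0 * m / lam2 * np.sum(np.abs(r))        # right-hand side of (3.2)
print(D, bound)
```

The bound is typically far from tight, since it replaces a signed sum over residuals by a sum of absolute values.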
Table 1: The mean value of coefficient estimates based on 50 replications.

Var      x1         x2        x3        x4        x5        x6         x7        x8
Lasso    -1.81126   1.69923   0         0         0.59680   -0.00024   0.00697   -0.00095
ALasso   -1.86473   1.76253   0         0         0.63182   0          0.00002   0
Ridge    -1.33509   1.67174   0         0.00151   0.70761   0.07696    0.00711   0.14399
ENet     -1.85062   0.73087   0.73087   0         0.57687   -0.00032   0.00036   -0.00011


Table 2: The MSE of the methods based on 50 replications.

Methods   Lasso   ALasso   Ridge   ENet
MSE       1.529   1.601    1.921   0.175

4. Simulation study 

In this section we report a numerical simulation study comparing the Elastic Net
procedure with Lasso, ALasso and Ridge. All four methods are known to cope with
collinearity. However, Lasso, ALasso and Ridge can select only one
of a set of highly correlated predictors, whereas, in line with Theorem 3.1, the Elastic
Net procedure can select all the relevant highly correlated variables into the model. In
the extreme situation where some variables are exactly identical, Lasso, ALasso and Ridge
select only one of the identical variables, while the Elastic Net procedure selects all of
them and, moreover, assigns them identical coefficients. We now demonstrate this
by the following numerical simulation.

We generated data from the partially linear model $Y = X'\beta + f(T) + \varepsilon$, where
$\beta = (-2, 1, 1, 0, 2/3, 0, 0, 0)'$, $X = (x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8)'$,
$f(T) =$ [...] with $T \sim U[-1, 1]$, and $\varepsilon \sim N(0, 0.04)$. Moreover, we assume
that $x_3 = x_2$ and that $x_4$ is a linear combination of $x_1$, $x_2$, $x_3$ and $e$ [...],
where $x_i$, $i = 1, 2, 5, 6, 7, 8$, and $e$ follow $N(0, 1)$. The kernel function is [...].


We ran the simulations in R with $n = 1000$ and 50 replications, applying the Lasso,
ALasso, Ridge and the Elastic Net procedure for variable selection. We turned the ALasso
and Elastic Net problems into Lasso problems and estimated the coefficients
by LARS. We picked a value for $\lambda_2$, say $\lambda_2 = 1/3$, and chose the optimal value
of $\lambda_1$ by 10-fold CV. The best value of the bandwidth is of order $O(n^{-1/5})$, so we
found an effective bandwidth $h$ for $\hat m_X(T)$ and $\hat m_Y(T)$ with the interpolation
technique. The coefficient estimates are in Table 1. The MSE (mean squared error) values
are in Table 2, where $\mathrm{MSE} = \|\hat\beta - \beta\|^2$.

Table 3: Summary of the leukaemia data.

Method   Test error   No. of genes   Step of selection
Lasso    4/34         26             29
ALasso   3/34         22             25
ENet     2/34         51             60


Several observations can be made from Tables 1 and 2. Lasso, ALasso and Ridge select
only $x_2$ of the identical pair, whereas the Elastic Net procedure selects both $x_2$ and
$x_3$ and assigns them identical coefficients. With ALasso, $x_6$ and $x_8$ are excluded
from the model and the coefficient of $x_7$ is almost zero; owing to its oracle property,
ALasso eliminates the zero components more accurately than the other methods. Ridge
retains almost all the variables in the model. The Elastic Net procedure accurately
selects all the highly correlated variables into the model. These results show that the
Elastic Net procedure works better than the other three methods on data with strongly
correlated variables.
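
The behaviour with exactly identical predictors can be reproduced in a few lines. This is a hedged sketch, not the simulation reported above: the data are hypothetical and already in linear form (no kernel step), the solver is cyclic coordinate descent started from zero rather than LARS, and `enet_cd` is an illustrative name. With identical columns the Lasso minimizer is not unique, so which copy receives the weight depends on the algorithm.

```python
import numpy as np

def enet_cd(X, y, lam1, lam2, n_sweeps=500):
    """Minimise ||y - X b||^2 + lam2*||b||^2 + lam1*||b||_1 by coordinate descent.

    Setting lam2 = 0 gives the Lasso criterion."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * b[j]
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam1 / 2.0, 0.0) / (col_sq[j] + lam2)
            r -= X[:, j] * b[j]
    return b

rng = np.random.default_rng(2)
n = 200
x1, x2, x4 = rng.standard_normal((3, n))
X = np.column_stack([x1, x2, x2, x4])             # third column duplicates the second
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)                 # standardised predictors
y = X @ np.array([-2.0, 1.0, 1.0, 0.0]) + 0.02 * rng.standard_normal(n)
y = y - y.mean()

b_lasso = enet_cd(X, y, lam1=0.05, lam2=0.0)      # Lasso: weight lands on one copy
b_enet = enet_cd(X, y, lam1=0.05, lam2=1.0 / 3.0) # Elastic Net: weight is shared
print(np.round(b_lasso, 3))
print(np.round(b_enet, 3))
```

Because the criterion is strictly convex for $\lambda_2 > 0$, its minimizer is unique and symmetric in the two identical columns, which is why the two Elastic Net coefficients coincide.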

5. Real data example 

A typical microarray data set has thousands of genes and fewer than 100 samples. Because
of the unique structure of microarray data, a good variable selection method should
have the following properties:

(1) Gene selection should be built into the procedure.

(2) It should not be limited by the fact that $p \gg n$.

(3) For those genes sharing the same biological pathway, it should be able to automatically
include whole groups into the model once one gene among them is selected.

Most popular methods fail with respect to at least one of the above properties (Zou
and Hastie [13]). The Lasso is good at (1) but fails at (2) and (3). As an automatic
variable selection method, the Elastic Net procedure naturally overcomes the difficulty of
$p \gg n$ and has the ability to do group selection. We use the leukemia data to illustrate
the advantage of the Elastic Net procedure for partially linear models.

The leukemia data consists of 3571 genes and 72 samples (Golub et al. [6]). In the 
training data set, there are 38 samples, among which 27 are type 1 leukemia (ALL) and 
11 are type 2 leukemia (AML). The goal is to construct a diagnostic rule based on the 
expression level of those 3571 genes to predict the type of leukemia. The remaining 34 
samples are used to test the prediction accuracy of the diagnostic rule. To apply the 
Elastic Net, ALasso and Lasso, we first coded the type of leukemia as a 0-1 response y. 
We did the variable selection by Lasso, ALasso and Elastic Net. The kernel function $K(\cdot)$
is the same as in Section 4. We used 10-fold CV to select the tuning parameters.

We stopped the Lasso after 60 steps, ALasso after 30 steps, and the Elastic Net after
150 steps. Table 3 compares the Elastic Net with Lasso and ALasso. The Elastic Net gives
the best classification, and it has an internal gene selection facility. Figure 1 displays the
solution paths and the gene selection results. Lasso selects 26 genes at step 29, while
ALasso selects 22 genes at step 25; owing to its oracle property, ALasso eliminates more
zero components from the final model than Lasso. The optimal Elastic Net model is given
at step 60 with 51 selected genes. Note that the training sample size is 38, so the Lasso
and ALasso can select at most 38 genes. In contrast, the Elastic Net selects more than 38
genes and is not limited by the sample size. The Elastic Net is particularly useful when
the number $p$ of predictors is much bigger than the sample size $n$. Neither Lasso nor
ALasso is a very satisfactory variable selection method in the case $p \gg n$.

Figure 1: The coefficient paths at each step of Lasso, ENet and ALasso.

6. Conclusions 

Collinearity between variables is a problem we usually encounter in high-dimensional
data. If it cannot be handled properly, the accuracy of the resulting models does not reach
the required standard, and the interpretability of the models is seriously affected. In
this paper, we have proposed a more effective selection method, the Elastic Net procedure, to
handle collinearity and select all the strongly correlated variables. The Elastic Net
procedure for partially linear models produces a sparse model with good prediction
accuracy while encouraging a group effect. The simulations and empirical results demonstrate
the good performance of the Elastic Net and its superiority over the other methods.